
Non-record: 30ep Cosine TTT on SwiGLU + U-Net (1xH100, val_bpb=1.1175)#661

Closed
andrewbaggio1 wants to merge 1 commit into openai:main from andrewbaggio1:submission/cosine-ttt-30ep-v2

Conversation

@andrewbaggio1

Summary

Non-record 1xH100 submission. Single change from PR #462: TTT_EPOCHS=30 (default 10).

val_bpb = 1.1175 (sliding window stride=64, seed 1337) | 7.5 MB artifact | 1xH100 SXM

Results (1xH100 SXM, seed 1337)

| Metric | Value |
| --- | --- |
| Training steps | 936 (wallclock capped) |
| Post-quant roundtrip val_bpb | 1.0684 |
| Sliding window val_bpb | 1.1175 |
| Artifact size | 7.5 MB |
| TTT time | 3,376 s (30 epochs, cosine) |
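The sliding-window val_bpb above (stride=64) amounts to scoring each token exactly once while letting it see up to a full window of context. A minimal sketch of that evaluation pattern, with `score_fn` as a hypothetical model interface (the repo's actual eval code and window size are not shown here):

```python
import math

def sliding_window_nll(score_fn, tokens, window=2048, stride=64):
    """Sum per-token NLL (in nats) using overlapping windows.

    `score_fn(ctx, targets)` is a hypothetical interface returning the
    summed NLL of `targets` given that they appear at the end of `ctx`.
    Only the final `stride` tokens of each window after the first are
    scored, so every token is counted exactly once while still seeing
    up to `window` tokens of context.
    """
    total = score_fn(tokens[:window], tokens[:window])
    for start in range(window, len(tokens), stride):
        ctx = tokens[start - window + stride:start]
        tgt = tokens[start:start + stride]
        total += score_fn(ctx + tgt, tgt)
    return total

def nats_to_bpb(total_nll_nats, n_bytes):
    # bits-per-byte = (NLL converted from nats to bits) / raw byte count
    return total_nll_nats / math.log(2) / n_bytes
```

A smaller stride gives each scored token more effective context (hence a fairer bpb estimate) at the cost of proportionally more forward passes.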

Approach

This submission reuses PR #462's full SwiGLU + U-Net architecture and runs 30-epoch cosine TTT instead of the default 10, consistent with PR #481's finding that more cosine TTT epochs improve results.
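The single change is stretching the same cosine decay over more test-time-training epochs. A minimal sketch of such a schedule (function name, base_lr, and min_lr are illustrative, not taken from train_gpt.py):

```python
import math

def cosine_ttt_lr(epoch, total_epochs=30, base_lr=1e-4, min_lr=0.0):
    """Cosine-annealed learning rate for test-time training (TTT).

    Decays from base_lr at epoch 0 to min_lr at the final epoch;
    raising total_epochs from 10 to 30 stretches the same curve.
    """
    progress = epoch / max(1, total_epochs - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Full 30-epoch schedule, as used by TTT_EPOCHS=30:
schedule = [cosine_ttt_lr(e, total_epochs=30) for e in range(30)]
```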

Architecture

PR #462's stack unchanged: 11L SwiGLU (hidden=1792), U-Net gated skips, BigramHash (8192), SmearGate, EMA (0.9985), Late QAT, Partial RoPE, LN Scale, Int6+zstd.
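The Int6+zstd artifact step can be pictured as symmetric 6-bit quantization followed by entropy coding. A rough pure-Python sketch, using zlib as a stand-in for zstd (the PR's actual per-channel scaling and tight 6-bit packing are not shown):

```python
import zlib

def quantize_int6(weights):
    """Symmetric per-tensor quantization to 6-bit integers in [-32, 31].

    Hypothetical stand-in for the PR's Int6 scheme; the roundtrip error
    per weight is at most half a quantization step (scale / 2).
    """
    scale = max((abs(w) for w in weights), default=0.0) / 31 or 1.0
    q = [max(-32, min(31, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

def pack_and_compress(q):
    # One byte per value (unpacked) then zlib; the real artifact
    # presumably packs 6-bit values tightly and uses zstd instead.
    blob = bytes(v + 32 for v in q)
    return zlib.compress(blob)
```

The "post-quant roundtrip val_bpb" row in the table measures exactly this: evaluate after a quantize -> dequantize roundtrip so the reported score reflects the compressed artifact, not the fp weights.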

Limitation

1xH100 only — needs 8xH100 verification. 30 TTT epochs estimated ~7 min on 8xH100 (within eval budget).

Credits

PR #462 (JoeProAI), PR #481 (mrdavtan), PR #442 (sjp611), PR #398 (felipe-parodi)

Test plan

  • train_gpt.py compiles (ast.parse passes)
  • Artifact under 16 MB (7.5 MB)
  • PR only adds files to one new folder
  • submission.json includes all required fields
  • Train log included
  • Pending: 8xH100 verification + additional seeds
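The first checklist item can be reproduced in a few lines (a generic sketch, not the repo's actual CI script):

```python
import ast

def compiles(source: str) -> bool:
    """Return True iff the Python source parses, mirroring the
    'ast.parse passes' check in the test plan."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# Usage: compiles(open("train_gpt.py").read())
```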

🤖 Generated with Claude Code

@andrewbaggio1
Author

Closing — superseded by #672 (1.0781 BPB vs 1.1175 BPB). Reducing reviewer burden.
